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Introduction 


Scarcity of hydrocarbon resources and high exploration risks motivate the development of high fidelity 
algorithms and computationally viable approaches to exploratory geophysics. Whereas early approaches 
considered least-squares minimization, recent developments have emphasized the importance of robust 
formulations, as well as formulations that allow disciplined encoding of prior information into the in¬ 
verse problem formulation. The cost of a more flexible optimization framework is a greater computa¬ 
tional complexity, as least-squares optimization can be performed using straightforward methods (e.g., 
steepest descent, Gauss-Newton, L-BFGS), whilst incorporation of robust (non-smooth) penalties re¬ 
quires custom changes that may be difficult to implement in the context of a general seismic inversion 
workflow. In this study, we propose a generic, flexible optimization framework capable of incorporating 
a broad range of noise models, forward models, regularizes, and reparametrization transforms. This 
framework covers seamlessly robust noise models (such as Huber and Student’s t), as well as sparse 
regularizes, projected constraints, and Total Variation regularization. The proposed framework is also 
expandable — we explain the adjustments that are required for any new formulation to be included. 
Lastly, we conclude with few numerical examples demonstrating the versatility of the formulation. 

Method and Theory 

Classic Waveform Inversion 


The canonical waveform inversion problem is to find the medium parameters for which the modeled 
data matches the recorded data in a least-squares sense (Tarantola 1984). For linear wave equation 

H{m)u = q , (1) 


where H{m) is the discretized wave equation of model parameters m, u is the discretized wavefields and 
q are the discretized source functions. The data are then given by sampling the wavefield at the receiver 
locations: d = Su + e, where e stands for model misspecification and measurement noise. For example, 
for the simplest case of constant-density acoustics in the frequency domain, the wave equation is rep¬ 
resented by the Helmholtz operator H(m ) = ( 0 2 m + V 2 amended with appropriate boundary conditions, 
with squared slowness m. 


The classic inverse problem can be cast as an explicit least-squares problem as follows: 

min (f>(m) =Y,\\SH(my l Q-D\\ 2 F , (2) 

m CO S -V- ' 

h(m) 

where D = d\ ,dj,-- . df corresponds to the recorded data vectors and Q = [q\ ,<72, - - -, r/.v] are the cor¬ 
responding source functions. || • ||f denotes the Frobenius norm which is defined as ||A||/r = \J'Lij a jj- 

Generalized Formulations 


To accommodate robust formulations, regularizes and constraints (e.g., Eq.[4]), we consider 

min 0(y) =£p(/?(Cy),D)+fi(y), (3) 

y m 

where p is a general misfit penalty, and R is a regularization function, and C is a linear transformation 
to a space of interest (e.g. Fourier, wavelet, or curvelet). Thus, the classical inversion expression (Eq.[2]) 
is a specific case where one set R = 0, p as pis(D.D) = ||D — D\\ F , and C = I (the identity). 
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We provide two notable examples of R. In the I st example, we encode constraints using an indicator 
function R that is infinite valued. Let B be a closed set of feasible values of the parameters y, and consider 


R(y) 


0 if y G B 
oo if y B 


(4) 


The regularizer R in this case is equivalent to a constraint y £ B. but we choose to express it as in 
Equation [3] because it offers a simple and uniform way to describe our algorithmic framework. 

For the second example, take R to be Ri asso (y ) = A ||y|| i. This is a sparsifying penalty on the transform 
space coefficients y; the larger the A, the faster elements of y are driven to 0. In this case, R is finite 
valued, but not smooth. We will also use i\ constraints , R^-ball defined by Eq.[4]with B(y) = {y : ||y||i < 
t} for some parameter t. Projection onto this set can be performed in linear time. 


The algorithmic framework we propose requires two assumptions: 


1. The misfit penalty p is differentiable. The differentiability assumption holds for least-squares, 
Huber ( Guitton and Symesj 2003| ), hybrid ( Bube and Langan} 1997), Student’s t (Aravkin et al. 


20121, generalized self-tuning extensions ([Aravkin and van Leeuwen 2012 1 amongst others. 


2. We require that the following problem can be solved quickly (e.g., linear time in length of y)\ 

8 '■= argmin-||g —y|| 2 + R(g). 
g z 


The vector g is known as the proximity, or prox, operator of R applied to y, written as prox s (y). 


The two examples we developed above satisfy the 2 nd requirement for the R. For Eq. |dj the prox 
operator is simply projection onto a set, which is often known in closed form. For the non-smooth Ri aS so 
the prox can also be implemented entry-wise, and is simply given by the soft thresholding operator 
(Si {y))i = sign(y,j • max(0, |y,j — A). While a closed form formula may not be available for all cases, as 
long as prox R (y) can be efficiently computed, our algorithmic framework applies. 

Algorithmic Framework 


To develop our algorithmic framework, we adopt and expand on the PQN algorithm of Schmidt et al. 


(2009). This algorithm was developed for optimization of costly objective functions with simple con¬ 
straints. Such settings are well suited to full-waveform inversion, since function evaluations require 
multiple expensive forward and adjoint PDE solves. PQN builds a quasi-Newton Hessian approxima¬ 


tion for the objective from gradient information using a limited memory BFGS scheme (Nocedal and 
Wright \999\, and then solve an auxiliary subproblem 


nunf?(y)+/?(y) 


where Q is a limited memory-based convex quadratic, and R represents an indicator function for simple 
constraints (such as Eq.|4|. The subproblem is solved by the spectral projected gradient method (Birgin] 
et al. 2000), which relies upon assumption 2 we imposed in our framework, since prox is equivalent to 
projection when R encodes constraints. 


In this study we expand this framework, allowing any function R admitting an efficient proximity opera¬ 
tor. In particular, we can then naturally incorporate sparsity promoting regularization, and total variation. 
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Figure 1: Gaussian (dashed black line), Laplace (dash-dotted red line), and Student’s t (solid blue 
line); Densities (left plot), Negative Log Likelihoods (center plot), and Influence Functions (right plot). 
Student’s /-density has heavy tails, a non-convex log-likelihood, and a re-descending influence function. 


Formulation Examples 

Robust Penalties 

In statistics, robust penalties are used since they penalize large residuals less than the classic quadratic 
norm. The Huber penalty is quadratic at the origin and becomes linear for large values of the param¬ 
eter. Student’s t based penalties are sublinear and have been shown to be extremely robust against 
outlier noise in the seismic context (Aravkin et al. 2012[ ). Practical implementation simply requires 
replacement of the least-squares penalty by one of the form £, log(v + r?), where the sum runs across 
a residual vector. The parameter v can be tuned by cross validation, or using a generalized statistical 
formulation (Aravkin and van Lccuwcn[|201 2). 


Sparse Regularization and Constraints 


A great deal of work has underlined the importance of sparse regularization, both theoretically (Donoho 


2006 Candes and Tao 20061 and in practice in application to seismic problems (see e.g., Herrmann et al. 
(2012) and references therein). Curvelets (Demanet, |2006 1 have been a particularly popular choice of 
transform space, though recent work also advocates for adaptive learning of dictionaries (Horesh and 
Haber 20091. Once an efficient transform is found, l\ - norm solvers offer improved recovery of the pa¬ 
rameters of interest. The non-smoothness of the regularizer is a key feature and cannot be circumvented. 

Results 

Spectral Elements Time Domain 2D Model 

High-order spectral elements in the time domain over a 2D grid is used for simulation, with a test 
model interpolated from the SEG/EAGE salt dome model using 4 sources and 121 receivers. The ti 
“discrepancy” is in comparison with the test model, using the Frobenius norm, so for regularized models, 
it is not necessarily expected to go to zero since the test model may not be the optimal solution. The I st 
case employs the traditional least-squares loss function (Fig. 2a). The 2 nd uses least-squares loss with 
i\ penalty (lasso) (Fig. 2b t. The 3 rd test case is Student’s t loss function, pstudent-, with no regularization 
functional R (Fig. 2d). In the 4 th test case, we work in the curvelet domain and use Ri aS so- The objective 


is reduced by 5 and 7 orders of magnitude, resp. (for similar initial point). 

Finite Difference Frequency Domain 3D Model 

Forward simulations are performed in the frequency domain on a 3D grid. An interpolated version of 
the SEG/EAGE model served as the true model. Data are generated using 6 sources and 60 receivers. 
To validate the code, a standard least-squares inversion is performed, i.e., p = Pls, R = 0, C = I. The 
objective decreases by 2 orders of magnitude within 6 iterations (results not shown). The 2 nd experiment 
uses the same objective pis (and C = I) but adds a constraint R that bounds the model to be inside the 
£i ball of radius t. t can be either set empirically or via cross-validation on a corpus of models. The 
initial iterate is set to the standard least-squares solution, so further improvement in the objective value 
shown on a linear scale in Fig. 2c is small. The 3 rd experiment is a variant of the second that replaces 
the l\ ball with an i\ penalty of strength A, where A is chosen empirically but could be again tuned 
via cross-validation. The original PQN algorithm does not handle this case since R is not an indicator 
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Figure 2: Convergence plots, for various R, p ,C. C = I except for (Jcb 


function, so we modify the algorithm. In particular, we use the “Projected Scaled Sub-gradient + Active 
Set” solver (Schmidt et al. 2007) as the sub-problem solver. The convergence of pLs(h(y),D) is also 
shown in Fig. 2c in blue (but note that we optimized pis(h(y),D) + A||y||i ). Finally, we switch from 
Pls to the Student’s t based loss Pstudent-; • We show results for R = 0 only (Fig. [2T| , but we also have 
results for combining with l\ ball and i\ penalty terms as done for least-squares. The left-axis of the 
plot shows the least-squares residual, which decreases, even though this objective was not optimized. 


Conclusions 


We have presented a general algorithmic framework for large-scale FWI that offers flexible incorporation 

of smooth objectives, as well as various forms of regularization. The framework requires only that the 

proximity operator of the regularization be easily computable, which includes projections onto simple 

sets, as well as sparsity regularization. 
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